A High-Performance Domain Specific Parallel and Distributed Massive Collection System

Authors

  • Uri Shani
  • Aviad Sela
  • Inna Skarbovsky
Abstract

High performance and ease of use are the two main goals of the Massive Collection System (MCS). At the outset, MCS is a classical process that consumes a massive amount of input, processes it according to business specifications, and produces a comparable amount of output. To do that, MCS has a massively parallel architecture whose core processing task executes the business rules on a continuous flux of input records organized in files. Each processing task executes a processing "plan" written in a high-level domain-specific language (DSL) designed for domain experts rather than professional programmers. The MCS design for performance rests on two factors: the first is the massively parallel execution framework; the second is the effective compilation and execution of the domain-specific MCS plans. The execution framework is built on top of IBM's J2EE implementation, WebSphere Application Server (WAS). The entire MCS is a WAS application, written in Java, which met its performance goals as well as its ease-of-use goals. The performance challenges of MCS were stated in terms of hundreds of millions of records a day. We selected Java and WAS for the implementation because of their development advantages, which allowed us to demonstrate the MCS performance goals rather early, within several months, and the results were shown to scale almost linearly with input size.
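The architecture described in the abstract, parallel tasks that each run a plan over the records of an input file, can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the names `McsSketch`, `Plan`, `processFile`, and `processFiles` are hypothetical, a plan is modeled as a plain function from one record to an optional output record, and a standard `ExecutorService` thread pool stands in for the WAS-based execution framework. It does not reproduce the actual MCS DSL, compiler, or runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from MCS itself.
class McsSketch {

    /** Stand-in for a compiled plan: maps one record to an output record, or drops it. */
    interface Plan {
        Optional<String> apply(String record);
    }

    /** Runs the plan over every record of one file, preserving record order. */
    static List<String> processFile(List<String> records, Plan plan) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            plan.apply(record).ifPresent(out::add);
        }
        return out;
    }

    /** Processes files in parallel, one task per file; results keep the input file order. */
    static List<List<String>> processFiles(List<List<String>> files, Plan plan, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit one processing task per input file.
            List<Future<List<String>>> pending = new ArrayList<>();
            for (List<String> file : files) {
                pending.add(pool.submit(() -> processFile(file, plan)));
            }
            // Collect results in submission order.
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : pending) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each file is handled by an independent task, adding workers lets more files be processed concurrently, which is one plausible reading of how throughput could grow with input size; the abstract itself does not describe the mechanism at this level of detail.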


Similar resources

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traffic and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...


Static Task Allocation in Distributed Systems Using Parallel Genetic Algorithm

Over the past two decades, PC speeds have increased from a few instructions per second to several million instructions per second. The tremendous speed of today's networks as well as the increasing need for high-performance systems has made researchers interested in parallel and distributed computing. The rapid growth of distributed systems has led to a variety of problems. Task allocation is a...


Husky: Towards a More Efficient and Expressive Distributed Computing Framework

Finding efficient, expressive and yet intuitive programming models for data-parallel computing system is an important and open problem. Systems like Hadoop and Spark have been widely adopted for massive data processing, as coarse-grained primitives like map and reduce are succinct and easy to master. However, sometimes over-simplified API hinders programmers from more fine-grained control and d...


Green Energy-aware task scheduling using the DVFS technique in Cloud Computing

Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint and CO2 emissions in high performance computing systems (HPCs) such as clusters, Grid and Cloud that a large number of parallel. Reducing energy consumption for high end computing can bring various benefits such as red...



Publication date: 2007